Improving the quality of bug data in software repositories
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Context: Researchers have increasingly recognised the benefit of mining software repositories
to extract information. Thus, integrating a version control tool (VC tool) and bug tracking
tool (BT tool) in mining software repositories as well as synchronising missing bug tracking
data (BT data) and version control log (VC log) becomes of paramount importance, in order
to improve the quality of bug data in software repositories. In this way, researchers can carry out
good-quality research that benefits software projects, especially open source software projects,
where information is limited because development is distributed and shared data to track the issues
of the project are therefore not common. BT data often appears not to be mirrored when considering
what developers logged as their actions, resulting in reduced traceability of defects in the
development logs (VC logs). Version control system (VC system) data can be enhanced with data from the bug tracking system (BT system), because VC logs report on past software development activities.
When these VC logs and BT data are used together, researchers can have a more complete
picture of a bug’s life cycle, evolution and maintenance. However, current BT systems and
VC systems provide insufficient support for cross-analysis of both VC logs and BT data for
researchers in empirical software engineering research: prediction of software faults, software
reliability, traceability, software quality, effort and cost estimation, bug prediction, and bug
fixing.
Aims and objectives: The aim of the thesis is to design and implement a tool chain to
support the integration of a VC tool and a BT tool, as well as to synchronise the missing VC
logs and BT data of open-source software projects automatically. The syncing process, using
Bicho (BT tool) and CVSAnalY (VC tool), will be demonstrated and evaluated on a sample
of 344 open source software (OSS) projects.
Method: The tool chain was implemented and its performance evaluated semi-automatically.
The SZZ algorithm approach was used to detect and trace BT data and VC logs. In its formulation, the algorithm looks for the terms “Bugs” or “Fixed” (case-insensitive) along with the ‘#’ sign, which indicates the ID of a bug in the VC system and BT system respectively. In addition, the SZZ algorithm was dissected in its formulation, and precision and recall were analysed for the use of “fix”, “bug” or “# + digit” (e.g., #1234) when tracking possible bug IDs from the VC logs of the sample OSS projects.
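As an illustration of the matching step described above, the following is a minimal Python sketch of the two heuristics applied to a single VC log message. The regular expressions and the function name `classify_message` are assumptions made for this example; the exact expressions used in the thesis are not given in this abstract.

```python
import re

# Hypothetical patterns (assumptions for illustration only):
# keyword heuristic: "bug"/"bugs" or "fix"/"fixes"/"fixed", case-insensitive.
KEYWORD_RE = re.compile(r"\b(?:bugs?|fix(?:e[sd])?)\b", re.IGNORECASE)
# ID heuristic: a '#' sign followed directly by digits, e.g. "#1234".
BUG_ID_RE = re.compile(r"#(\d+)")


def classify_message(message):
    """Report which heuristic matches a log message and any candidate bug IDs."""
    return {
        "keyword": bool(KEYWORD_RE.search(message)),
        "bug_ids": [int(digits) for digits in BUG_ID_RE.findall(message)],
    }


if __name__ == "__main__":
    print(classify_message("Fixed crash on startup, see #1234"))
    print(classify_message("Refactor parser"))
```

Keeping the two heuristics separate makes it possible to evaluate each part of the formulation on its own, which is the kind of dissection the Method paragraph describes.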
Results: The results of this analysis indicate that use of “# + digit” (e.g., #1234) is more
precise for bug traceability than the use of the “bug” and “fix” keywords. Such keywords are
indeed present in the VC logs, but they are less useful when trying to connect the development
actions with the bug traces: their recall is high, but their precision is lower. Overall, the results indicate that
VC log and BT data retrieved and stored by automatic tools can be tracked and recovered
with better accuracy using only a part of the SZZ algorithm. In addition, the results indicate that
80-95% of all the missing BT data and VC logs for the 344 OSS projects have been synchronised
into the Bicho and CVSAnalY databases respectively.
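For readers unfamiliar with the two measures used above, the sketch below shows how precision and recall could be computed for one heuristic, given the set of commits it flags and the set of commits known to be bug-related. The function name and the sample commit IDs are hypothetical and are not data from the thesis.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall of a set of flagged items against a ground truth.

    predicted: items flagged by a heuristic (e.g. commits matching a keyword)
    actual:    items that are truly bug-related (assumed to be known)
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: share of flagged items that are truly bug-related.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of truly bug-related items that were flagged.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical example: a heuristic flags four commits, two of which are real fixes.
    flagged = {"c1", "c2", "c3", "c4"}
    real_fixes = {"c1", "c2"}
    print(precision_recall(flagged, real_fixes))
```

A heuristic that flags many commits can reach high recall while its precision stays low, which matches the pattern reported for the “bug” and “fix” keywords.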
Conclusion: The presented tool chain will eliminate repetitive activities in
traceability tasks, as well as software maintenance and evolution. This thesis provides a
solution towards the automation and traceability of BT data of software projects (in particular,
OSS projects) using VC logs to complement and track missing bug data.
Synchronising involves completing the missing data of bug repositories with the logs
detailing the actions of developers. Synchronising benefits various branches of empirical software
engineering research: prediction of software faults, software reliability, traceability, software
quality, effort and cost estimation, bug prediction, and bug fixing.